Compute Less to Get More: Using ORC to Improve Sparse Filtering

Authors

  • Johannes Lederer
  • Sergio Guadarrama
Abstract

Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering in image classification pipelines to spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests stopping Sparse Filtering early. We therefore introduce the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We show that this stopping criterion is related to preprocessing procedures such as Statistical Whitening and that it can make image classification with Sparse Filtering considerably faster and more accurate.

Introduction

Typical ways to improve image classification are to collect more annotated data or to change the way the data is represented and processed by the model. In practice, however, the number of samples is typically limited by the amount of supervision needed, so many approaches instead transform the data by means of feature learning. Feature learning algorithms transform the data to obtain a more beneficial feature representation in which classification becomes easier. Although deep learning methods have recently been proposed to jointly learn the feature transformation and the classification for image classification (Krizhevsky, Sutskever, and Hinton 2012), in this work we focus on unsupervised feature learning, specifically on Sparse Filtering, because of its simplicity and scalability.

Feature learning for image classification is typically done in three steps: pre-processing, (un)supervised dictionary learning, and encoding. An abundance of procedures is available for each of these steps, but accurate image classification requires procedures that are effective and interact beneficially with each other (Agarwal and Triggs 2006; Coates and Ng 2011; Coates, Ng, and Lee 2011; Jia, Huang, and Darrell 2012; Le 2013; LeCun, Huang, and Bottou 2004). To ensure accurate results and efficient computations, a profound understanding of these procedures is therefore crucial.

In this paper, we study the performance of Sparse Filtering (Ngiam et al. 2011) for image classification. Our main contributions are:

  • We show that Sparse Filtering can strongly benefit from early stopping;
  • We show that the performance of Sparse Filtering is correlated with spectral properties of feature matrices on test sets;
  • We introduce the Optimal Roundness Criterion (ORC), a stopping criterion for Sparse Filtering based on the above correlation, and demonstrate that the ORC can considerably improve image classification.

Feature Learning for Image Classification

Feature learning algorithms (1) often consist of two steps: in a first step, a dictionary is learnt, and in a second step, the samples are encoded based on this dictionary. A typical dictionary learning step for image classification is sketched in Figure 1: first, random patches (samples) are extracted from the training images. These patches are then pre-processed using, for example, Statistical Whitening or Contrast Normalization. Finally, an unsupervised learning algorithm is applied to learn a dictionary from the pre-processed patches (a code sketch of this step follows below). Once a dictionary is learnt, several further steps need to be applied to finally train an image classifier; see, for example, (Coates and Ng 2011; Coates, Ng, and Lee 2011; Jia, Huang, and Darrell 2012; Le 2013).
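As a concrete illustration of this dictionary learning step, the following minimal Python sketch extracts random patches, applies Contrast Normalization and a ZCA form of Statistical Whitening, and leaves the unsupervised learner as a placeholder. It assumes numpy and an in-memory stack of grayscale training images; the function names, patch counts, and the ZCA formulation are our illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the dictionary learning step in Figure 1 (illustrative,
# not the authors' code). Assumes grayscale training images stacked in a
# numpy array of shape (num_images, height, width).
import numpy as np

rng = np.random.default_rng(0)

def extract_random_patches(images, patch_size=9, num_patches=100000):
    """Extract random square patches; rows = features, columns = samples."""
    num_images, height, width = images.shape
    patches = np.empty((patch_size * patch_size, num_patches))
    for j in range(num_patches):
        i = rng.integers(num_images)
        r = rng.integers(height - patch_size + 1)
        c = rng.integers(width - patch_size + 1)
        patches[:, j] = images[i, r:r + patch_size, c:c + patch_size].ravel()
    return patches

def contrast_normalize(X, eps=1e-8):
    """Contrast Normalization: per patch, subtract the mean and divide by
    the standard deviation of the pixel values."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def statistical_whitening(X, eps=1e-5):
    """ZCA whitening: afterwards, the sample covariance of the rows
    (features) is approximately the identity."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    S, U = np.linalg.eigh(cov)  # eigendecomposition of the symmetric covariance
    return U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T @ Xc

# patches = extract_random_patches(train_images)      # train_images assumed given
# patches = statistical_whitening(contrast_normalize(patches))
# dictionary = unsupervised_learner(patches)          # e.g. Sparse Filtering
```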
Our pipeline is similar to the one in (Coates and Ng 2011): we extract square patches comprising 9 × 9 pixels, pre-process them with Contrast Normalization (which consists of subtracting the mean and dividing by the standard deviation of the pixel values) and/or Statistical Whitening, and finally pass them to Random Patches or Sparse Filtering. (Note that our outcomes differ slightly from those in (Coates and Ng 2011) because we use square patches comprising 9 × 9 pixels instead of 6 × 6 pixels.) Subsequently, we apply soft-thresholding for encoding, 4 × 4 spatial max pooling for extracting features from the training images, and finally L2 SVM classification (cf. (Coates and Ng 2011)).

Feature learning has been found to considerably improve classification in numerous examples. Insight into the underlying principles of feature learning algorithms such as Statistical Whitening and Sparse Filtering is therefore of great interest.

[Figure 1: A typical dictionary learning step (training images → extraction of random patches → pre-processing → unsupervised learning → dictionary). Statistical Whitening and Contrast Normalization are examples of pre-processing procedures; Random Patches and Sparse Filtering are examples of unsupervised learning procedures.]

In mathematical terms, a feature learning algorithm provides a transformation

$$F : \mathbb{R}^{l \times p} \to \mathbb{R}^{n \times p}, \qquad X \mapsto F(X) \qquad (1)$$

of an original feature matrix $X \in \mathbb{R}^{l \times p}$ to a new feature matrix $F(X) \in \mathbb{R}^{n \times p}$. We adopt the convention that the rows of the matrices correspond to the features and the columns to the samples; this convention implies in particular that $l \in \mathbb{N}$ is the number of original features, $p \in \mathbb{N}$ the number of samples, and $n \in \mathbb{N}$ the number of new features.

The Optimal Roundness Criterion

Roundness of feature matrices

Feature learning can be seen as a trade-off between reducing the correlations of the feature representation and preserving relevant information. This trade-off can be readily understood by looking at Statistical Whitening. For this, recall that Statistical Whitening pre-processing transforms a set of image patches represented by $X_{\mathrm{Patch}}$ into a new set of patches represented by $F_{\mathrm{Patch}}(X_{\mathrm{Patch}})$ by changing the local correlation structure. More precisely, Statistical Whitening transforms patches $X_{\mathrm{Patch}} \in \mathbb{R}^{n' \times p}$ ($n' < n$), that is, subsets of the entire feature matrix, into new patches $F_{\mathrm{Patch}}(X_{\mathrm{Patch}})$ such that

$$F_{\mathrm{Patch}}(X_{\mathrm{Patch}}) \, F_{\mathrm{Patch}}(X_{\mathrm{Patch}})^\top = n' I_{n'}.$$

Statistical Whitening therefore acts locally: while the correlation structures of the single patches are directly and radically changed, the structure of the entire matrix is affected only indirectly. However, these indirect effects on the entire matrix are important for the following. To capture these effects, we therefore introduce the roundness of a feature matrix $F := F(X)$ given an original feature matrix $X$. On a high level, we say that the new feature matrix $F$ is round if the spectrum of the associated Gram matrix $FF^\top \in \mathbb{R}^{n \times n}$ is narrow. To specify this notion, we denote the ordered eigenvalues of $FF^\top$ by $\sigma_1(F) \geq \dots \geq \sigma_n(F) \geq 0$ and their mean by $\bar{\sigma}(F) := \frac{1}{n} \sum_{i=1}^{n} \sigma_i(F)$, and define roundness as follows:

Definition 1. For any matrix $F \neq 0$, we define its roundness as ...
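To make these spectral notions concrete, here is a short numpy sketch (our own illustration, not code from the paper) that whitens a random patch matrix, verifies the Gram identity $F_{\mathrm{Patch}}(X_{\mathrm{Patch}}) F_{\mathrm{Patch}}(X_{\mathrm{Patch}})^\top = n' I_{n'}$ stated above, and computes the ordered eigenvalues $\sigma_i(F)$ and their mean $\bar{\sigma}(F)$ that enter the definition of roundness. For a whitened matrix all eigenvalues coincide, so its spectrum is as narrow as possible.

```python
# Illustrative numpy sketch (not the authors' code): whiten a random patch
# matrix, check the Gram identity of Statistical Whitening, and compute the
# spectral quantities sigma_i(F) and sigma_bar(F) used to define roundness.
import numpy as np

rng = np.random.default_rng(0)

n_prime, p = 81, 10000                     # e.g. flattened 9x9 patches
X_patch = rng.standard_normal((n_prime, p))

# Statistical Whitening: rescale so that F @ F.T = n' * I_{n'}.
Xc = X_patch - X_patch.mean(axis=1, keepdims=True)
S, U = np.linalg.eigh(Xc @ Xc.T)           # symmetric eigendecomposition
F_patch = np.sqrt(n_prime) * (U @ np.diag(1.0 / np.sqrt(S)) @ U.T @ Xc)

gram = F_patch @ F_patch.T
print(np.allclose(gram, n_prime * np.eye(n_prime)))   # True

def spectrum(F):
    """Ordered eigenvalues sigma_1(F) >= ... >= sigma_n(F) of the Gram
    matrix F @ F.T, together with their mean sigma_bar(F)."""
    sigmas = np.linalg.eigvalsh(F @ F.T)[::-1]         # descending order
    return sigmas, sigmas.mean()

sigmas, sigma_bar = spectrum(F_patch)
# Whitened features have a maximally narrow spectrum (all eigenvalues equal):
print(np.allclose(sigmas, sigma_bar))                  # True
```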

Similar Articles

A Total Ratio of Vegetation Index (TRVI) for Shrubs Sparse Cover Delineating in Open Woodland

Persian juniper and pistachio grow at low density in the rangelands of north-east Iran. These rangelands are populated by widespread, low-density evergreen conifers and sparse pistachio shrubs, which are essential not only environmentally but also genetically, as seed sources for pistachio improvement in orchards. Rangelands offer excellent opportunities...

A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation

Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative-filtering-based recommender systems. In order to improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem, a...

Effect of Rating Time for Cold Start Problem in Collaborative Filtering

Cold start is one of the main challenges in recommender systems. Solving the sparsity challenge of cold-start users is hard, as most cold-start users and items are new. Since many general methods for recommender systems overfit on cold-start users and items, recommendation to new users and items is an important and hard task. In this work, to overcome the sparsity problem, we present a new method for rec...

A Hybrid Recommender System Using Trust and Two-Way Clustering to Improve the Performance of Collaborative Filtering

In the present era, the amount of information grows exponentially, so finding the required information among the mass of information has become a major challenge. The success of e-commerce systems and online business transactions depends greatly on the effective design of the product recommender mechanism. Providing high-quality recommendations is important for e-commerce systems to assist users i...

A Novel Trust Computation Method Based on User Ratings to Improve the Recommendation

Today, trust has become one of the most beneficial solutions for improving recommender systems, especially in the collaborative filtering method. However, trust statements suffer from a number of shortcomings, including their sparsity and users' inability to express explicit trust for other users in most existing applications. Thus, to overcome these problems, this ...


Publication year: 2015